I.  INTRODUCTION


A.  CONTROL OF DATA

One problem plaguing today's information manager is the
serious lack of control over data that has developed as
computers and their applications have spread throughout
organizations. Recently, there has been a considerable
increase in the attention being paid to this problem.
However, most organizations whose information systems were
developed in the 1960s and early to mid-1970s still suffer
the ill effects of improperly controlled data. In these
environments, redundant, incomplete, and inaccurate data are
still prevalent. Under such circumstances, the probability
that faulty data will directly contribute to poor
organizational planning and ineffective decision-making is
significantly increased.

While some organizations have taken action to correct their
data control problems, many others are overwhelmed by the
enormity, complexity, and cost of the task. In very large
organizations, the cost and complexity take on proportions
that appear prohibitive. Unfortunately, it is these large
organizations that have the greatest need for carefully
controlled data. Large organizations are also more likely to
experience adverse effects that extend beyond those found in
smaller enterprises.

One of these effects is manifest in the helpless position in
which some organizational user groups find themselves. As one
cog in a large wheel, these groups often rely on data supplied
by other organizational elements over which they exercise no
control. A serious danger in this circumstance is the receipt
and subsequent use of inaccurate data.

Information systems need valid data to be effective! A rash
assumption by a data processing element that inaccurate data
are correct can have devastating effects on a large
organization, especially if information based on the data is
used for strategic planning and decision-making.

When input data of unknown quality are being transferred among
data processing elements within an organization, the problem
is almost always a systemic one with deep and widespread
roots. Corrective action on an organization-wide basis is
often neglected because of excessive costs. Users who find
themselves in these situations are frequently left to their
own devices, and they must develop their own means for
validating inputs. An illustration of a user group
experiencing such a situation is the Office of the Deputy
Chief of Staff, Plans (DCSPLANS), U.S. Army Military
Personnel Center (MILPERCEN), in Alexandria, Virginia.


B.  DCSPLANS, MILPERCEN

U.S. Army MILPERCEN is responsible for the worldwide
distribution and professional development of Army officer and
enlisted personnel. Within MILPERCEN, DCSPLANS has the mission
of planning, programming, and executing current and future
force alignment, i.e., matching personnel inventory to force
authorization levels.

DCSPLANS is composed of five branches, each of which monitors
a specified portion of the force alignment mission.

Figure 1.1  Force Plans Branch

Each branch uses a series of computerized models to perform a
variety of forecasting functions. See Figure 1.1 for an
example of the models and input files used by the DCSPLANS
branches. Many of these models are quite complex and draw
input data from both MILPERCEN and non-MILPERCEN sources.
Some input files are extremely large, feed a number of models,
and historically have been prone to error. None of the input
files are under DCSPLANS control.

The output of the DCSPLANS models is used for crucial
top-level decision-making that will determine the structure
and content of Army forces well into the future. As such, the
DCSPLANS output must exhibit a very high degree of validity.
Currently, however, DCSPLANS is unable to verify the accuracy
of much of the input data being used by its models. Thus,
despite the correctness of the models themselves, the
reliability of the DCSPLANS product must be considered
doubtful.

DCSPLANS officials are quite concerned about their present
inability to ensure that the data used in their models are
accurate. They realize that the problem will not be solved for
them soon by the organization (MILPERCEN), and that they must
devise their own local solution. A variety of options are
available to them. Some are quite poor (e.g., maintain the
status quo and rely on the input data sources to ensure
validity); others are more feasible but still contain serious
shortcomings (e.g., update/convert every DCSPLANS model to
include its own validation process). A much more effective and
efficient alternative is described in this paper, i.e., the
use of an active data dictionary as a "filter" to validate
input before the data are processed by the various models.


C.  THESIS METHODOLOGY

This thesis will explore the concept of using an active data
dictionary as a local validation tool. It will proceed from a
general review of data validation, through an examination of
data dictionaries and their design, to an illustration of how
an active data dictionary can be beneficially applied to
DCSPLANS operations.

Chapter Two of the thesis cites the essential role of data
validation as an integral part of a data processing system.
Validation criteria and techniques used in the "data filter"
are reviewed, and the general nature of edit and validation
rules is introduced.

Chapter Three explores the data dictionary. It includes some
basic definitions and concepts, and specifically addresses how
an active data dictionary is used to validate data.

Chapter Four outlines an approach to the "local" initial
design of a data dictionary "filter" system. This chapter also
includes a recommended "checklist" of questions a user group
can ask to define its own data dictionary/validation
requirements and system structure.

Chapter Five specifically addresses the DCSPLANS situation. It
cites a proposed goal and some key objectives of a DCSPLANS
validation system, and uses a modified structure diagram of a
"data filter" to illustrate the recommended approach to the
DCSPLANS data validation dilemma.

Chapter Six summarizes the results of this thesis.


II.  INPUT VALIDATION


A.  GENERAL DESCRIPTION

Inaccurate data items can easily find their way into master
files and databases, either through direct input by users or
through improper processing actions by application programs.
Regardless of origin, inaccurate data are poison in any ADP
system. Information created from inaccurate data also tends to
be inaccurate, and decisions based upon such information are
counterproductive to organizational goals in almost every
instance. Data is a valuable resource, and its accuracy is
crucial to organizational success.

Validation is that set of actions which attempts to preclude
the existence of inaccurate data within an information system.
Validation tests can be implemented at any number of stages
within the data processing cycle: prior to input, upon input,
during processing, and after processing (output checks).
"Input validation", as implemented by an active data
dictionary system, occurs at the second stage.

Input validation focuses specifically on data being entered
into a system. Its aim is to detect errors and thereby ensure
the initial accuracy of the master file or database being
constructed/updated. [Ref. 1: p. 326] During input validation,
checks are conducted to ensure that the input/update operation
itself is legal, and that input data do not violate prescribed
accuracy constraints. Creation of a new file or the update of
an existing one is a processing stage that demands extremely
careful data validation, especially in those cases where the
input data are received from sources outside the control of
the processing element. Fortunately, it is at this stage that
the accuracy of data can be checked most effectively
[Ref. 2: p. 239]. One additional caution must be mentioned at
this point: data do not become inaccurate from entry errors
alone. Data may be inaccurate simply because they are old!
Previously accurate values may no longer be correct because
available new values have not superseded older values due to
neglected updates. Validation processes also must check for
these types of inaccuracies.
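The caution about aged data can be expressed as a simple currency check. The Python sketch below is illustrative only: it assumes, hypothetically, that each field arrives with the date it was last updated and that the user assigns each field a maximum acceptable age. The field names and limits are invented for the example.

```python
from datetime import date, timedelta


def find_stale_fields(record, max_age_days, as_of):
    """Return the fields whose values are older than their allowed age.

    record       -- dict: field name -> (value, date the value was last updated)
    max_age_days -- dict: field name -> maximum acceptable age in days
                    (hypothetical, user-defined currency limits)
    as_of        -- processing date against which age is measured
    """
    stale = []
    for field_name, limit in max_age_days.items():
        if field_name not in record:
            continue  # a missing field is a completeness problem, not a currency one
        _value, last_updated = record[field_name]
        if as_of - last_updated > timedelta(days=limit):
            stale.append(field_name)
    return stale


# Hypothetical record: GRADE was updated recently, ASSIGNMENT two years ago.
record = {
    "GRADE": ("E5", date(1985, 1, 15)),
    "ASSIGNMENT": ("UNIT_A", date(1983, 6, 1)),
}
print(find_stale_fields(record, {"GRADE": 365, "ASSIGNMENT": 365}, as_of=date(1985, 6, 1)))
# prints ['ASSIGNMENT']
```

A value flagged this way is not necessarily wrong; it is simply old enough that the user may wish to route it to error handling rather than trust it.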


B.  VALIDATION TECHNIQUES

1.  Category

The general category of input validation techniques used by
the proposed "data filter" examines input data in the exact
form in which they arrive for processing. The techniques
involved detect errors by checking the "acceptability" of both
the data transactions and the data themselves. This checking
is accomplished through a series of programmed
instructions/rules, and is implemented very effectively by an
active data dictionary system. Three basic techniques are
included in the category: transaction validation, format
checks, and reasonableness checks. A well-designed validation
program includes a combination of all three. [Ref. 3: p. 248]

The transaction validation technique is used to verify the
legitimacy of the transactions that input data. Format checks
and reasonableness checks, on the other hand, are used to
examine the correctness of the data items themselves. To give
a clearer picture of the "data filter" design presented in the
next two chapters, a brief description of the three validation
methods is provided below.


2.  Transaction Validation

Transaction validation should be the first technique to be
applied. It certifies that "a specific transaction is one that
can be processed by the system and is being submitted
properly." [Ref. 4: p. 218] Its focus is the verification that
the type and purpose of the transaction are legitimate
processing actions, and that the originator of the transaction
has the authority to initiate it. Transactions determined to
be invalid are rejected.

Related validation checks which also must be conducted at this
juncture of the processing cycle are checks for sequential
dependencies and/or proper timing. For example, a
Monthly_Report transaction may not be able to take place until
Monthly_Update transactions are successfully executed.
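The two transaction checks just described (authority of the originator, and sequential dependencies such as Monthly_Report waiting on Monthly_Update) can be sketched in Python as follows. The authorization and prerequisite tables, and the originator names FORCE_PLANS and DATA_CONTROL, are hypothetical placeholders; in the proposed data filter such tables would be held as dictionary metadata rather than hard-coded.

```python
from dataclasses import dataclass

# Hypothetical authorization table: transaction type -> originators allowed to submit it.
AUTHORIZED = {
    "Monthly_Update": {"FORCE_PLANS", "DATA_CONTROL"},
    "Monthly_Report": {"FORCE_PLANS"},
}

# Hypothetical sequencing rules: a transaction type may run only after its
# prerequisite transaction types have completed successfully.
PREREQUISITES = {
    "Monthly_Report": {"Monthly_Update"},
}


@dataclass
class Transaction:
    kind: str        # transaction type, e.g. "Monthly_Update"
    originator: str  # the element submitting the transaction


def validate_transaction(txn, completed):
    """Return the reasons for rejecting txn; an empty list means it may proceed.

    completed -- set of transaction types already executed successfully this cycle.
    """
    errors = []
    allowed = AUTHORIZED.get(txn.kind)
    if allowed is None:
        errors.append(f"unknown transaction type {txn.kind!r}")
    elif txn.originator not in allowed:
        errors.append(f"{txn.originator} may not initiate {txn.kind}")
    missing = PREREQUISITES.get(txn.kind, set()) - completed
    if missing:
        errors.append(f"{txn.kind} must wait for: {', '.join(sorted(missing))}")
    return errors


print(validate_transaction(Transaction("Monthly_Report", "FORCE_PLANS"), completed=set()))
# prints ['Monthly_Report must wait for: Monthly_Update']
```

Returning every reason, rather than stopping at the first, lets the rejected transaction be reported back to its originator with a complete explanation.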

The role of transaction validation as a "first step" stems from
the potential damage which could be inflicted upon a system by
the processing of an invalid transaction. Even if the invalid
transaction is subsequently discovered, recovery may prove
extremely difficult. An ounce of prevention, in this case, is
certainly worth a pound of cure!

Once transaction validity is established, the input data
itself is examined through a series of format checks and
reasonableness checks.

3.  Format Checks

Format checks compare the actual contents of a field to a
preset series of user-defined rules. A record whose contents
fail to conform to the prescribed format is either rejected
outright or transferred to an appropriate error-handling
routine. Some of the more common format checks are:

a) Length Checks: used to verify that a field contains a
prescribed minimum, maximum, or fixed number of characters.

b) Character Type Checks: used to verify that a field contains
only specifically authorized value types, i.e., numerics only,
alphabetics only, blanks, or special characters.

c) Character Pattern Checks: used to verify that the contents
of a field match a prescribed pattern of alphabetics,
numerics, dashes, etc.

d) Date Checks: used to ensure that the contents of a date
field are entered in the required, standard format, i.e.,
YYMMDD or YYDDD.
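The four format checks can each be written as a small, general-purpose function whose parameters (lengths, character types, masks, date formats) would be supplied as rules. The Python sketch below rests on stated assumptions: ASCII-only character classes, a mask notation in which `9` means digit and `A` means letter (a common convention, not one specified here), and two-digit years read as 19YY.

```python
import re
from calendar import isleap
from datetime import date

# Character classes for character type checks (ASCII only, an assumption).
CHAR_TYPES = {
    "numeric": r"[0-9]+",
    "alphabetic": r"[A-Za-z]+",
    "blank": r" *",
}


def length_check(value, minimum=None, maximum=None, exact=None):
    """Check a field against a minimum, maximum, or fixed number of characters."""
    if exact is not None:
        return len(value) == exact
    too_short = minimum is not None and len(value) < minimum
    too_long = maximum is not None and len(value) > maximum
    return not (too_short or too_long)


def char_type_check(value, kind):
    """Check that a field holds only one authorized type of character."""
    return re.fullmatch(CHAR_TYPES[kind], value) is not None


def pattern_check(value, mask):
    """Check a field against a mask: '9' = digit, 'A' = letter, other characters literal."""
    regex = "".join(
        "[0-9]" if m == "9" else "[A-Za-z]" if m == "A" else re.escape(m) for m in mask
    )
    return re.fullmatch(regex, value) is not None


def date_format_check(value, fmt):
    """Check a YYMMDD or YYDDD (Julian) date; two-digit years are assumed to be 19YY."""
    if fmt not in ("YYMMDD", "YYDDD"):
        raise ValueError(f"unsupported date format {fmt!r}")
    if len(value) != len(fmt) or not re.fullmatch(r"[0-9]+", value):
        return False
    year = 1900 + int(value[:2])
    if fmt == "YYMMDD":
        try:
            date(year, int(value[2:4]), int(value[4:6]))
        except ValueError:  # e.g. month 13 or 30 February
            return False
        return True
    day_of_year = int(value[2:])
    return 1 <= day_of_year <= (366 if isleap(year) else 365)
```

For example, `pattern_check("123-45-6789", "999-99-9999")` accepts an SSN-style value, while `date_format_check("850230", "YYMMDD")` rejects a nonexistent 30 February. The date check does more than match digits: it also confirms that the date actually exists.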

4.  Reasonableness Checks

Reasonableness checks test data items to ensure that data
values fall within the limits of established constraints.
These constraints are separated into three basic types. Field
constraints limit the value of a given data item. Intrarecord
constraints limit values between fields in the same record.
Interrecord constraints limit values between fields in
different records. [Ref. 5: p. 179] Reasonableness checks
based upon field constraints are fairly straightforward in
design and application. Intrarecord and interrecord constraint
checks, however, deal with logical accuracy and the
interrelationships among data items. As such, they are much
more difficult to develop and manage. Common reasonableness
checks are:
a) Field Constraints

- Range Checks - used to verify that the field value falls
within a specified range, i.e., the value does not violate an
upper or lower limit.

- Sequence Checks - used to test a specially created field to
ensure records are processed in the proper order. These checks
are also used to verify the presence of all required records.

- Completeness Checks - used to confirm that each mandatory
field in a record is filled with a data item of some
prescribed size.

- Date Checks - used to verify that the contents of a date
field do not violate earliest or latest acceptable date
restrictions.

- Code Checks - used to verify that the contents of a code
field are contained within a listing of valid and current
codes.

b) Intrarecord and Interrecord Constraints:

- Completeness Checks - used to identify those fields in a
record which must be filled based upon the contents of other
fields in that record (intrarecord) or other records
(interrecord).

- Consistency Checks - used to verify that the values in
certain fields are valid in relation to the data values of
other fields (either in the same record or in other records).
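The field-constraint checks in list (a) are the simplest to mechanize, since each examines one field at a time (the sequence check, which spans the run, is included here because the list places it among them). A minimal Python sketch follows. The limits, code tables, and the assumption that the specially created sequence field holds consecutive integers are all hypothetical choices made for the example.

```python
from datetime import date


def range_check(value, low, high):
    """Field value must not violate the lower or upper limit."""
    return low <= value <= high


def date_range_check(value, earliest, latest):
    """Date must fall between the earliest and latest acceptable dates."""
    return earliest <= value <= latest


def code_check(value, valid_codes):
    """Code must appear in the listing of valid, current codes."""
    return value in valid_codes


def completeness_check(record, mandatory, min_size=1):
    """Return the mandatory fields that are absent or shorter than min_size."""
    missing = []
    for field_name in mandatory:
        value = record.get(field_name)
        if value is None or len(str(value).strip()) < min_size:
            missing.append(field_name)
    return missing


def sequence_check(records, key):
    """Report records out of order or missing, assuming the sequence field
    (key) holds consecutive integers starting from the first record's value."""
    problems = []
    expected = records[0][key] if records else None
    for rec in records:
        if rec[key] != expected:
            problems.append(f"expected sequence {expected}, found {rec[key]}")
        expected = rec[key] + 1  # resynchronize after each record
    return problems


print(sequence_check([{"SEQ": 1}, {"SEQ": 2}, {"SEQ": 4}], "SEQ"))
# prints ['expected sequence 3, found 4']
```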

An example of an intrarecord completeness check is, "If the
Conversion Indicator field in a record is filled, then the
Conversion Code field in that record must also be filled." An
interrecord version of a completeness check is as follows: "If
the VRE Multiplier field is filled for any record in this run,
then all VRE Multiplier fields must be filled."

An example of an intrarecord consistency check is, "If the MOS
code in a record is 63H, then the grade value in the record
must be either E4 or E5." An interrecord consistency check is,
"No SSN field value may be the same as the SSN field value of
another record."

It is also possible to have "interfile" dependencies, e.g., a
record with an SSN field value of "999999999" in file "A" must
have the same MOS field value as a record in file "B" which
has an identical SSN field value of "999999999."
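Each example above can be turned into a rule function that sees more than a single field. In the Python sketch below, the field names (CONV_IND, CONV_CODE, VRE_MULT, MOS, GRADE, SSN) and the MOS/grade pairing are illustrative stand-ins for the examples in the text, not actual file layouts.

```python
from collections import Counter


def conversion_completeness(record):
    """Intrarecord completeness: a filled Conversion Indicator requires a Conversion Code."""
    if record.get("CONV_IND") and not record.get("CONV_CODE"):
        return ["CONV_CODE required when CONV_IND is filled"]
    return []


def vre_completeness(run):
    """Interrecord completeness: if any record in the run has a VRE Multiplier, all must."""
    filled = [bool(rec.get("VRE_MULT")) for rec in run]
    if any(filled) and not all(filled):
        return ["VRE_MULT must be filled in every record of this run"]
    return []


def mos_grade_consistency(record):
    """Intrarecord consistency: MOS 63H requires grade E4 or E5."""
    if record.get("MOS") == "63H" and record.get("GRADE") not in ("E4", "E5"):
        return [f"grade {record.get('GRADE')} invalid for MOS 63H"]
    return []


def unique_ssn(run):
    """Interrecord consistency: no two records may carry the same SSN."""
    counts = Counter(rec["SSN"] for rec in run)
    return [f"duplicate SSN {ssn}" for ssn, n in counts.items() if n > 1]


def interfile_mos_match(file_a, file_b):
    """Interfile dependency: a record in file A must carry the same MOS as the
    record in file B with the identical SSN (records absent from B are skipped)."""
    mos_in_b = {rec["SSN"]: rec["MOS"] for rec in file_b}
    return [
        f"MOS mismatch for SSN {rec['SSN']}"
        for rec in file_a
        if rec["SSN"] in mos_in_b and mos_in_b[rec["SSN"]] != rec["MOS"]
    ]
```

Note that the interrecord and interfile checks must see the whole run or both files at once, which is one reason they are harder to develop and manage than field checks.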


C.  EDIT AND VALIDATION RULES

There must be an organized and consistent method for applying
the validation checks cited above to data being input into an
information system. The vehicle for this application is the
edit and validation rule (EVR). EVRs are explicit statements
of constraints about the data in a system. These rules monitor
the basic structure and relationships of data items, and
enforce processing restrictions established by the information
manager. [Ref. 6: p. 146]
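Because an EVR is an explicit statement of a constraint, it can be stored as data and applied by one generic routine instead of being buried in program logic. The Python sketch below illustrates that idea under assumed conventions: the rule vocabulary (`required`, `range`, `codes`, `length`) and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EVR:
    """One edit and validation rule: an explicit, declarative constraint on a field."""
    field: str
    check: str        # hypothetical vocabulary: "required", "range", "codes", "length"
    params: tuple = ()


def violates(rule, record):
    """True if record breaks rule."""
    value = record.get(rule.field)
    if rule.check == "required":
        return value in (None, "")
    if value in (None, ""):
        return False  # absence is reported only by "required" rules
    if rule.check == "range":
        low, high = rule.params
        return not (low <= value <= high)
    if rule.check == "codes":
        return value not in rule.params
    if rule.check == "length":
        return len(value) != rule.params[0]
    raise ValueError(f"unknown check {rule.check!r}")


def apply_evrs(rules, record):
    """Return the rules that record violates, in rule order."""
    return [rule for rule in rules if violates(rule, record)]


rules = [
    EVR("SSN", "required"),
    EVR("SSN", "length", (9,)),
    EVR("MOS", "codes", ("11B", "63H")),
    EVR("AGE", "range", (17, 65)),
]
print([(r.field, r.check) for r in apply_evrs(rules, {"SSN": "12345", "MOS": "63H", "AGE": 70})])
# prints [('SSN', 'length'), ('AGE', 'range')]
```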

Two key issues concerning EVRs must be addressed when building
a data validation system. The first is how to properly develop
consistent rules. Consistent rules promote accurate data,
whereas contradictory rules produce an unreliable data system
that eventually will crash.

(Definition and development of EVRs will be covered in Chapter
Four as an integral part of the overall "data filter" design
process.)
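One simple, mechanically detectable form of contradiction arises when two or more range rules on the same field leave no value that satisfies all of them. The Python sketch below checks only that case. It is an illustration of rule-consistency checking, not a complete method: contradictions involving code lists or cross-field rules would need further analysis.

```python
from collections import defaultdict


def find_contradictory_ranges(rules):
    """rules: iterable of (field, low, high) range constraints.

    Several range rules on one field can all hold only if their intersection
    [max(lows), min(highs)] is non-empty. Returns the fields whose rules can
    never be satisfied together.
    """
    bounds = defaultdict(list)
    for field_name, low, high in rules:
        bounds[field_name].append((low, high))
    conflicts = []
    for field_name, pairs in bounds.items():
        low = max(p[0] for p in pairs)
        high = min(p[1] for p in pairs)
        if low > high:
            conflicts.append(field_name)
    return conflicts


print(find_contradictory_ranges([("AGE", 17, 65), ("AGE", 70, 99), ("YOS", 0, 30), ("YOS", 5, 40)]))
# prints ['AGE']
```

Run against the rule set before the rules are put into service, a check of this kind would catch a rule set that rejects every value of a field.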

The second key issue is where to place an EVR module (i.e., is
it better to embed it as part of an application program, or is
it better to make it a separate validation program?). The use
of an active data dictionary as a "data filter" argues for the
latter approach. The rationale for such a placement is set
forth in the next chapter.


III.  DATA DICTIONARY AS "DATA FILTER"


A.  BASIC CONCEPTS

Four basic concepts are central to a clear understanding of
how a data dictionary can be used locally to validate data
maintained and provided by other sources. These are: Data
Dictionary, Metadata, "Active" Data Dictionary, and Data
Extraction.


1.  Data Dictionary

A data dictionary is a centralized repository of all
definitive information about the relevant data in an
enterprise. The data dictionary provides the user a
description of what data exist, what they look like, and what
they mean. [Ref. 7: p. 1] A data dictionary can be as simple
as a manual catalog system or as complex as an automated set
of programs which controls a wide range of the enterprise's
data processing operations.

2.  Metadata

The real world of an enterprise contains a number of data
objects (entities) which are represented in the enterprise's
information system as data elements, records, and files. For
example, customers (entity) are represented by a set of data
elements/fields (CUST_ID, CUST_NAME, etc.) which comprise
records (CUST_REC), which, in turn, are grouped into files
(CUST_FILE). The data used to define and describe these
entities are called metadata, i.e., data about the data.
Metadata are stored in the data dictionary, forming a metadata
database or metadatabase. [Ref. 8: p. 9] Dictionary metadata
contain the characteristics of each data object. The metadata
answer the following questions:

a) What data is available in the enterprise?

b) What does the data mean?

c) How is the data structured?

d) What constraints and relationships exist?

Typically, dictionary metadata include: object name, short
name, synonyms or aliases, source, narrative description,
records/files that use or contain the data object, data
structure/format, integrity constraints (e.g., value range),
and relationships/dependencies. [Ref. 9: p. 13] Metadata are
essential ingredients in the validation of data by a data
dictionary system.
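The typical metadata attributes just listed map naturally onto a record structure. The Python sketch below is one hypothetical way to hold dictionary entries in memory, reusing the customer example from the text. The attribute names, the sample values, and the lookup-by-alias design are assumptions made for illustration, not a prescribed dictionary schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """One data dictionary entry; the attributes follow the typical list above."""
    object_name: str
    short_name: str
    description: str
    source: str
    aliases: list = field(default_factory=list)
    used_in: list = field(default_factory=list)      # records/files using the object
    data_format: str = ""                            # structure/format, e.g. a mask
    constraints: dict = field(default_factory=dict)  # integrity constraints
    related_to: list = field(default_factory=list)   # relationships/dependencies


class Metadatabase:
    """Minimal in-memory metadatabase; objects can be found by name, short name, or alias."""

    def __init__(self):
        self._entries = {}
        self._names = {}  # any known name -> canonical object name

    def register(self, entry):
        self._entries[entry.object_name] = entry
        for name in (entry.object_name, entry.short_name, *entry.aliases):
            self._names[name] = entry.object_name

    def lookup(self, name):
        return self._entries[self._names[name]]  # raises KeyError if unknown


db = Metadatabase()
db.register(MetadataEntry(
    object_name="CUSTOMER_IDENTIFIER",
    short_name="CUST_ID",
    description="Number assigned to each customer",
    source="Order entry",
    aliases=["CUSTNO"],
    used_in=["CUST_REC", "CUST_FILE"],
    data_format="999999",
    constraints={"range": (1, 999999)},
))
print(db.lookup("CUSTNO").used_in)
# prints ['CUST_REC', 'CUST_FILE']
```

Resolving synonyms to one canonical entry is what lets a single set of metadata serve programs that refer to the same data object by different names.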

3.  "Active" Data Dictionary

There are two basic modes in which a data dictionary can
function: passive or active. A passive data dictionary merely
registers the metadata and provides the user a facility for
interactive query and/or report generation. It does not
require that data processing operations depend upon it for
metadata, and no direct link is maintained between the passive
data dictionary and other system components (see Figure 3.1).
In fact, application programs and processes may obtain their
metadata entirely from other sources.

An active data dictionary, on the other hand, exercises a
great deal of control over processing and metadata usage
within an information system. A data dictionary is said to be
active with respect to an information system if, and only if,
that system is dependent upon the data dictionary for its
metadata (see Figure 3.2).

Figure 3.1  Passive Dictionary

A dictionary is active to a lesser degree when only some of
the system's programs and processes are dependent upon it for
metadata. The more programs or processes that rely on the
dictionary, the more active it is said to be. [Ref. 10: p. 22]
The value of an active data dictionary stems from the
establishment of mandatory interfaces between it and various
system processes. When the data dictionary is used as a "data
filter", these mandatory interfaces will ensure that input
data conform to predefined rules and standards.
Figure 3.2  Active Dictionary

NOTE: The "processed data" block in Figure 3.2 includes the
metadata and all programs used by the data dictionary.


4.  Data Extraction


Data extraction is a technique whereby a subset of data from a
very large file system or database is transferred to a much
smaller "extracted" file or database. The data extraction
process can be either quite simple or very complex. A complex
data extraction process is designed to collect, format, and
integrate data from a number of source files/databases into a
single data source whose contents are specifically tailored to
the needs of a single user or group of users. Such a system
involves extensive data description, subsetting, aggregation,
and presentation operations. [Ref. 11: p. 245] This thesis
addresses data extraction from a much simpler perspective,
i.e., as a means to limit the size of the data to be validated
by the data dictionary. In most cases, user applications do
not need all the data contained in a large data source. Thus,
the extraction of only pertinent data (a much smaller subset)
usually serves to increase the speed of application programs
acting upon the data. Such data extraction operations can be
used to greatly enhance the efficiency of the proposed "data
filter" when large source files are involved. A diagram of a
simple data extraction design which can be used in
conjunction with a data dictionary "filter" is shown in
Figure 3.3.
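The simple form of extraction described here (keep only the pertinent records and fields) can be sketched in Python as a streaming pass over a delimited source file. The comma-separated layout, the column names, and the selection criterion are hypothetical; the point is that downstream validation then sees only the trimmed extract.

```python
import csv
import io


def extract(source, wanted_fields, keep=lambda row: True):
    """Stream a large delimited source file and yield only the pertinent
    records (those for which keep(row) is true), trimmed to wanted_fields."""
    for row in csv.DictReader(source):
        if keep(row):
            yield {f: row[f] for f in wanted_fields}


# Hypothetical source file; in practice this would be an open handle on a large file.
source = io.StringIO(
    "SSN,NAME,MOS,GRADE,HOME_STATE\n"
    "111111111,ADAMS,63H,E5,VA\n"
    "222222222,BAKER,11B,E4,TX\n"
    "333333333,CLARK,63H,E4,OH\n"
)
extracted = list(extract(source, ["SSN", "MOS", "GRADE"], keep=lambda r: r["MOS"] == "63H"))
print(extracted)
# prints [{'SSN': '111111111', 'MOS': '63H', 'GRADE': 'E5'}, {'SSN': '333333333', 'MOS': '63H', 'GRADE': 'E4'}]
```

Using a generator means the large source never has to be held in memory at once; only the extract is materialized.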

Throughout the remainder of this thesis, the term "data
filter" will refer to the active data dictionary validation
system being proposed.


B.  CONFIGURATION

1.  Metadata Generation

The key to constructing the data filter is incorporating into
a data dictionary the capability to generate the metadata
needed by a system's edit and validation software. The
metadata generation is triggered by the edit and validation
software through the issuance of commands and applicable
parameters. The data filter must be designed so that the edit
and validation process, with its mandatory call for metadata
generation, is automatically activated during all data input
operations. The resulting metadata generation produces data
descriptions based upon the characteristics stored in the data
dictionary metadatabase. These data descriptions are
transformed into specific edit and validation rules (EVR) for
use by the edit and validation programs. [Ref. 12: p. 116]

Figure 3.3  Data Extraction Design
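The generation step can be pictured as a function the edit and validation software calls with a file name (the "command and parameters"). The function reads the stored characteristics for that file and returns ready-to-apply rules. In this Python sketch the metadatabase contents, the file name PERSONNEL_EXTRACT, the rule format (field, description, test), and the assumption that values arrive as strings are all hypothetical.

```python
# Hypothetical metadatabase contents: for each input file, the stored
# characteristics of each field. All names and limits are illustrative.
METADATABASE = {
    "PERSONNEL_EXTRACT": {
        "SSN": {"required": True, "length": 9, "numeric": True},
        "GRADE": {"codes": ["E1", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9"]},
        "AGE": {"numeric": True, "range": (17, 65)},
    },
}


def _blank(v):
    return v is None or v == ""


def generate_evr(file_name):
    """Respond to a metadata-generation command from the edit and validation
    software: turn the stored characteristics of file_name into EVR, each a
    (field, description, test) triple. Blank values are left to 'required' rules."""
    if file_name not in METADATABASE:
        raise KeyError(f"no metadata registered for {file_name!r}")
    rules = []
    for fld, spec in METADATABASE[file_name].items():
        if spec.get("required"):
            rules.append((fld, "required", lambda v: not _blank(v)))
        if "length" in spec:
            n = spec["length"]
            rules.append((fld, f"length {n}", lambda v, n=n: _blank(v) or len(v) == n))
        if spec.get("numeric"):
            rules.append((fld, "numeric", lambda v: _blank(v) or v.isdigit()))
        if "codes" in spec:
            codes = frozenset(spec["codes"])
            rules.append((fld, "code list", lambda v, c=codes: _blank(v) or v in c))
        if "range" in spec:
            lo, hi = spec["range"]
            rules.append((fld, f"range {lo}-{hi}",
                          lambda v, lo=lo, hi=hi: _blank(v) or (v.isdigit() and lo <= int(v) <= hi)))
    return rules


def validate(record, rules):
    """Apply generated EVR to one record; values are assumed to arrive as strings."""
    return [f"{fld}: failed {desc}" for fld, desc, test in rules if not test(record.get(fld))]


rules = generate_evr("PERSONNEL_EXTRACT")
print(validate({"SSN": "12345678X", "GRADE": "E5", "AGE": "70"}, rules))
# prints ['SSN: failed numeric', 'AGE: failed range 17-65']
```

The default-argument bindings (`n=n`, `c=codes`, `lo=lo, hi=hi`) capture each field's own limits; without them every lambda would see the values from the last loop iteration.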

2.  Edit and Validation Programs

Edit and validation programs are separate from the application
programs which enter the data into the system. They cannot be
executed without data dictionary metadata (in the form of EVR)
through which they filter all incoming data. These programs
are usually general purpose in nature. The tailoring of the
programs to specific types of data is accomplished through the
EVR provided by the active data dictionary. For example, a
data entry operation on one source file will result in
different EVR being passed to an edit and validation program
than will a data entry operation on another (the two files may
be composed of totally dissimilar data objects and may also
involve very different validation criteria). Various edit and
validation programs can be incorporated into the data filter
to accommodate distinct categories of data entry operations,
e.g., updates, deletions, creation of new files, etc.

3.  General Design

Figure 3.4 depicts a generalized data filter design. The data
dictionary generates metadata based upon commands from the
edit and validation program. Then, the metadata are
transformed into EVR which are fed back into the edit and
validation program. The edit and validation program "filters"
incoming data through the EVR during the edit and validation
process. "Correct" data are moved to the appropriate storage
area, and erroneous data are either rejected outright or sent
to an error file for future editing and resubmission.

Figure 3.4  General Data Filter Design
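The flow just described (request EVR from the dictionary, filter each record, route correct data to storage and erroneous data to an error file or outright rejection) is sketched below in Python. The `toy_dictionary` function is a hypothetical stand-in for the dictionary's metadata generation, and the two rules it returns are invented for the example.

```python
def data_filter(records, request_evr, file_name, reject_outright=False):
    """Filter incoming records through EVR obtained from the data dictionary.

    request_evr -- callable standing in for the active dictionary's metadata
                   generation; the filter has no rules of its own without it.
    Returns (accepted, error_file). Records failing any rule are written to
    error_file with their reasons, or simply dropped if reject_outright is set.
    """
    rules = request_evr(file_name)  # command from the filter -> EVR back
    accepted, error_file = [], []
    for rec in records:
        reasons = [f"{fld}: {desc}" for fld, desc, test in rules if not test(rec.get(fld))]
        if not reasons:
            accepted.append(rec)  # "correct" data go on to storage
        elif not reject_outright:
            error_file.append((rec, reasons))  # held for editing and resubmission
    return accepted, error_file


def toy_dictionary(file_name):
    """Hypothetical stand-in for EVR generation; file_name is ignored in this toy."""
    grades = {f"E{i}" for i in range(1, 10)}
    return [
        ("SSN", "must be 9 digits", lambda v: isinstance(v, str) and len(v) == 9 and v.isdigit()),
        ("GRADE", "must be E1-E9", lambda v: v in grades),
    ]


incoming = [{"SSN": "123456789", "GRADE": "E5"}, {"SSN": "12345", "GRADE": "E5"}]
accepted, error_file = data_filter(incoming, toy_dictionary, "ANY_INPUT_FILE")
print(len(accepted), error_file)
# prints 1 [({'SSN': '12345', 'GRADE': 'E5'}, ['SSN: must be 9 digits'])]
```

Passing the dictionary in as a callable mirrors the mandatory interface: the filter's behavior is determined entirely by the rules it receives.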

Figure 3.5  The Data Filter System


Figure 3.5 shows the complete data filter system with a data
extraction module added. This configuration increases data
validation efficiency by reducing the amount of data to be
"filtered." In the DCSPLANS case, due to the enormity of the
SMF and some other source data files, the time saved becomes
quite significant.


C.  ADVANTAGES

Almost all data editing and validation systems provide the
user a capability to validate and edit data, and to correct
and report erroneous data. There are, however, added benefits
to be gained by using the active data dictionary approach
which forms the basis of the data filter configuration
described above.

First, since the active data dictionary becomes the sole
source of metadata for all edit and validation processes,
redundant metadata are eliminated and metadata consistency is
promoted. In essence, a much greater degree of control over
metadata is realized, and, as a result, regulated, consistent
validation of data is achieved.

Second, the data dictionary affords the user a very flexible
and easily adjustable validation mechanism. Changes in data
and revisions to validation criteria do not require
modification of application programs or edit and validation
programs. Instead, changes are easily accommodated by simple
adjustments to metadata/EVR.
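This flexibility follows from keeping the limits in the metadata rather than in the program. A minimal Python sketch, with hypothetical field names and limits: the validation routine is untouched, and only the dictionary entry changes.

```python
# Hypothetical range metadata held in the dictionary: field -> (low, high).
dictionary = {"AGE": (17, 55), "YEARS_OF_SERVICE": (0, 30)}


def validate(record, metadata):
    """General-purpose range validation; limits come only from the metadata."""
    return [f for f, (low, high) in metadata.items() if not low <= record[f] <= high]


record = {"AGE": 60, "YEARS_OF_SERVICE": 12}
print(validate(record, dictionary))  # prints ['AGE']

# A revised validation criterion is entered in the dictionary alone;
# validate() is neither modified nor recompiled.
dictionary["AGE"] = (17, 62)
print(validate(record, dictionary))  # prints []
```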

Third, should the information system involved be file-based
(as is the case with DCSPLANS), the data dictionary approach
is an invaluable "bridge" for a future transition to a
database system. Ease of transition is promoted by already
having in existence an organized, centralized store of the
enterprise's metadata.

One other benefit of the proposed data filter system stems from the separation of the data extraction program from the actual edit and validation activities. Not only is overall validation speed increased, but the user also has the option, in exigent circumstances, to forego validation entirely if time constraints demand such action. An interdependent extraction/validation process would not allow this alternative.
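The bypass option can be sketched as a pipeline in which validation is a separate, optional stage. The stage functions and the `skip_validation` flag are hypothetical names for illustration; in a coupled design the validation call would sit inside the extraction step and could not be skipped this way.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def run_pipeline(
    source: Iterable[T],
    extract: Callable[[Iterable[T]], list[T]],
    run_model: Callable[[list[T]], object],
    validate: Optional[Callable[[list[T]], list[T]]] = None,
    skip_validation: bool = False,
) -> object:
    """Extraction and validation are separate stages, so validation can be bypassed."""
    data = extract(source)
    if validate is not None and not skip_validation:
        data = validate(data)  # pass only the records that survive the filter
    # With skip_validation=True, extracted data goes straight to the model, unvalidated.
    return run_model(data)
```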
IV.   PLANNING AND GENERAL DESIGN


A.   KEY DEVELOPMENT PHASES


A software product's ability to do what it is supposed to do efficiently is largely governed by the quality of the detailed design and coding that creates it. In turn, successful detailed design and coding are directly tied to the quality of initial planning and design activities. Thus, the planning and preliminary design steps taken by users to develop a local data filter are crucial, and must be comprehensively and carefully accomplished.

Planning and initial design of a data filter is a three-phase process. Phase one describes the system's environment and general characteristics. Phase two develops data definitions and validation criteria. Phase three produces an initial logical design of the system. A description of each of these phases is presented below, along with a "checklist" of relevant questions which serves as a guide for proceeding through the phase.

The checklists form a framework within which users/developers can methodically develop the data filter. The framework assists them in:

1. Obtaining a clear, comprehensive picture of the environment in which the data filter will function.

2. Identifying and defining the data to be validated, and determining the nature and scope of validation required.

3. Constructing well-defined, functionally structured validation and EVR modules.
B.   PHASE ONE - SYSTEM ENVIRONMENT/GENERAL CHARACTERISTICS

1. Description

This phase identifies all hardware and firmware being used (or projected for use) in the overall information system, and describes its environment (e.g., distributed vs. centralized system, file system vs. database system, etc.). It notes validation capabilities already built into the system, and also identifies commercial validation capabilities which are compatible with existing hardware and firmware.

Phase one also uncovers the general nature of the input data to be validated. It identifies the broad categories of input data, examines data stability and consistency, and looks at who exercises control over the entry of data into the system. This phase outlines data entry methods and notes the various processing stages at which data validation may occur (pre-input, during input, etc.). An overview of system output is also formulated. The level of accuracy required for the output is established, and the degree to which output validity depends upon valid input is determined.

2. Checklist


Answers to the following questions will provide a clear picture of the overall system, including inputs and outputs:

a) What major hardware components comprise the system?

b) What operating system is used?

c) What validation capabilities are already built into the system hardware/firmware?

d) Are there currently any plans to change/expand major system hardware?
e) Are any system-compatible data validation products currently available (either in-house or commercially)?

f) What system-compatible data dictionary software is currently available (either in-house or commercially)?

g) Are we dealing with a file-based or database system?

h) What portions of the information system are distributed?

i) How stable are system inputs? (i.e., Are different data elements, records and files added or deleted on a frequent basis?)

j) Are data definitions and parameters changed frequently?

k) Are we dealing with a stable number of data elements which will retain stable attributes?

l) Is input processed in batch mode, on-line, or both?

m) Is any pre-input validation conducted? Describe.

n) Is any output validation conducted? Describe.

o) What are the sources of input data? Identify all input files and the applications for which they provide data.

p) What degree of control over the entry and update of input data is exercised by system users?

q) From what locations, and by whom, can data be added, changed or deleted?

r) What sources beyond the user's control provide input data? Identify the data provided by each of these outside sources.

s) How often is data entered? Updated?

t) How is the processed data being used? (A general description, e.g., report generation, modeling, etc.)

u) For each application, report, etc., how critical is validity? (i.e., What are the consequences of inaccurate outputs?)


C.   PHASE TWO - DATA DEFINITION/VALIDATION CRITERIA

1. Description

This phase identifies and defines the system's data entities. For the purpose of the data filter, data entities include all data elements entered into the system and the records and files which contain them. The applications which use/process these entities are also established.

Phase two also sets forth all validation checks required. Data element characteristics such as description, range, type, size, sequence, etc. are recorded, and all entity relationships are carefully delineated. The information developed during this phase forms the data dictionary metadatabase, and is used to construct the system's EVR and validation program modules.

Answers to the questions listed below will enable the user/developer to identify, describe, and determine the interrelationships of all system entities. He will also be able to establish validation criteria for each entity and cross-reference them to the applications which require that such validation occur.
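One way to picture the phase two output is as a set of metadata entries that record where each element lives and which applications depend on its validity, so the cross-reference can be answered by a query. This is a sketch under assumed field names (`DataElement`, `validated_for`, and so on) with made-up sample entries, not the metadatabase layout used by DCSPLANS.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """Hypothetical metadatabase entry produced during phase two."""
    name: str
    record: str
    file: str
    char_type: str      # e.g. "alpha" or "numeric"
    max_chars: int
    # applications (models) whose output validity depends on this element
    validated_for: set[str] = field(default_factory=set)


def elements_to_validate(meta: list[DataElement], application: str) -> list[str]:
    """Cross-reference query: which elements must be validated for this application?"""
    return sorted(e.name for e in meta if application in e.validated_for)


def files_feeding(meta: list[DataElement], application: str) -> list[str]:
    """Which files supply elements whose validity this application depends on?"""
    return sorted({e.file for e in meta if application in e.validated_for})


meta = [
    DataElement("GRADE", "PERS", "FILE1", "alpha", 2, {"MODEL_X", "MODEL_Y"}),
    DataElement("AGE", "PERS", "FILE1", "numeric", 2, {"MODEL_X"}),
    DataElement("UNIT", "ASGN", "FILE2", "alpha", 6, {"MODEL_Y"}),
]
```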

a) What data elements does the system contain?

b) What record(s) contain these data elements?

c) What file(s) contain these records?

d) For each application (model):

- Which files feed it data? Which records?

- Which data elements does it use/process?

- Which data elements must be validated (i.e., does the validity of the application's output depend on this input data element being valid)?

- Is a specific sequence of data entry required?

- What pre-entry updates/transactions must occur, if any?

e) For each data element:

- What is its name? Any synonyms or aliases?

- What is its Short Name/Programming Name?

- What is its ID#?

- What is its character type (alpha, numeric, etc.)?

- What minimum and maximum number of characters are allowed?

- What numeric value range applies?

- What character pattern is used (e.g., CCC-NNN-CC)?

- Is there a minimum/maximum range of allowable change from one update to the next?

- What cause and effect relationships exist with other data elements, in the same record/file or in other records/files? (e.g., If "A" is changed, then "B" must be changed.)

- Is a particular update sequence required?

- Do date fields have any earliest or latest date limits?

- Do date fields require a special format (e.g., YYMMDD)?

- What direct relationships exist with other data items? (e.g., the value of "A" must always be twice that of "B".)

- Is the data element a code or a value that must be checked against a table or listing of valid codes or values?
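Many of the element-level questions above translate directly into parameters of a generic check. The sketch below assumes a picture convention in which `C` stands for a letter and `N` for a digit, uses Python's `%y%m%d` code for the YYMMDD format, and invents its own function names; cross-element rules such as "A must be twice B" are left out for brevity.

```python
import re
from datetime import date, datetime
from typing import Optional


def picture_to_regex(picture: str) -> str:
    """Translate a picture such as 'CCC-NNN-CC' into a regex (C = letter, N = digit, assumed)."""
    out = []
    for ch in picture:
        if ch == "C":
            out.append("[A-Za-z]")
        elif ch == "N":
            out.append("[0-9]")
        else:
            out.append(re.escape(ch))  # separators such as '-' must match literally
    return "".join(out)


def check_element(
    value: str,
    *,
    min_len: int = 0,
    max_len: Optional[int] = None,
    picture: Optional[str] = None,
    valid_codes: Optional[set[str]] = None,
    date_fmt: Optional[str] = None,
    earliest: Optional[date] = None,
    latest: Optional[date] = None,
) -> list[str]:
    """Apply only the checks whose metadata is supplied; return the failed check names."""
    errors: list[str] = []
    if len(value) < min_len or (max_len is not None and len(value) > max_len):
        errors.append("length")
    if picture is not None and re.fullmatch(picture_to_regex(picture), value) is None:
        errors.append("pattern")
    if valid_codes is not None and value not in valid_codes:
        errors.append("code")
    if date_fmt is not None:
        try:
            d = datetime.strptime(value, date_fmt).date()
        except ValueError:
            errors.append("date format")
        else:
            if (earliest is not None and d < earliest) or (latest is not None and d > latest):
                errors.append("date range")
    return errors


def change_allowed(old: float, new: float, max_change: float) -> bool:
    """Allowable change from one update to the next."""
    return abs(new - old) <= max_change
```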


D.   PHASE THREE - INITIAL LOGICAL DESIGN

1. Description

Phase three produces a model of the logical structure of the data filter system which later will be "built" (during coding and testing). Since it forms the basis for all further design steps and refinements, this preliminary logical design is the key step in the data filter design process. The data filter structure developed during this phase is based upon the general filter design cited in chapter three and the system environment and data/validation information gathered during phases one and two.

Phase three gives the user a description of the data filter system goal and objectives, and presents the major system functions. These major functions are then decomposed into sub-functions until a series of single, independent modules has been identified. This overall system architecture is depicted in a hierarchical structure diagram (see Figure 4.1) accompanied by narrative descriptions of the modules.

2. Checklist

Answers to the following questions will enable the user/developer to produce the information described above:

a) What is the goal of the system? (State the general long-term desired effect.)

b) What are the system's key objectives? (Enumerate the critical milestones to be accomplished to satisfy the stated system goal.)

c) What are the system's major functions? (List the general processing activities required to meet system objectives.) For example, a bank's checking account system may have four major system functions: (1) performing account administration (open accts., close accts., etc.), (2) processing deposits, (3) processing withdrawals, and (4) maintaining an account transaction database.

Figure 4.1  Structure Diagram
d) What modules (sub-functions) comprise each of the system's major functions? (Limit to no more than 3-5 modules per function, and repeat the process level by level until no further module decomposition is necessary, i.e., simple, independent modules have been created.)

e) What does each system module do? (Give a precise, concise description of approximately two sentences.)

3. Follow-on Design

Once the above phases have been completed and carefully documented, the data filter structure has been tailored to the user's specific environment and validation needs. Subsequent development involving detailed design (data flows, data stores, interfaces, etc.), coding, testing, etc. can follow using one of a number of existing applicable methodologies.
V.   THE DCSPLANS "DATA FILTER" SYSTEM

This chapter specifically addresses the DCSPLANS "data filter" system. It provides a statement of the system's overall goal and its key objectives. It also expands the general data filter design provided in chapter three into a more detailed hierarchical design structure tailored to the DCSPLANS situation.


A.   DCSPLANS SYSTEM GOAL AND OBJECTIVES

A number of DCSPLANS' unique operational characteristics must be considered when formulating the system's goal and its key objectives. These critical aspects are uncovered during Phases I and II of the preliminary development activity (presented in the previous chapter), and are used to create the Phase III deliverables illustrated in this chapter (System Goal/Objectives and Structure Diagram with Narratives). A sample of the DCSPLANS characteristics having the greatest impact on the general system design is presented below.

The most important fact is that DCSPLANS personnel have little faith in the accuracy of the input data they are receiving from a variety of very large source files prepared and maintained by elements outside their span of control. At the present time, DCSPLANS does not possess the capability to validate this questionable input data. Its personnel are, however, extremely concerned about the adverse impact of such input data on the validity of model outputs.

Input source files provide crucial data to DCSPLANS' force alignment models. Each of the files feeds a varying number of models, and supplies a unique set of data elements depending on the particular model involved. Generally, the data elements contained in the source files and the data elements required by the models remain the same, creating relatively good system stability in this regard. There are, however, occasional changes made in the data elements provided or required. A DCSPLANS validation tool must provide the flexibility to incorporate such changes easily.

In many cases, models using the same data elements from the same source file require different degrees of validation (e.g., the validity of input data element "A" from the Enlisted Master File may be crucial to the validity of Personnel Readiness Indicator Model output, but inconsequential to the validity of output produced by the Personnel Policy Projection Model (P3M)). Thus, a DCSPLANS validation tool must be able to differentiate between the validation required for Enlisted Master File data when used by the Personnel Readiness Indicator Model as opposed to the P3M, and it must apply edit and validation rules accordingly.
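A minimal sketch of model-dependent validation, assuming rules are keyed by the pair (source file code, model code). The codes `EMF` (Enlisted Master File) and `PRIM` (Personnel Readiness Indicator Model), and the 1-to-9 check on element "A", are hypothetical placeholders.

```python
from typing import Callable

Check = Callable[[str], bool]

# Hypothetical rule table keyed by (source file code, model code).
RULES: dict[tuple[str, str], dict[str, Check]] = {
    # Element "A" is critical to the readiness model: require a value from 1 to 9.
    ("EMF", "PRIM"): {"A": lambda v: v.isdecimal() and 1 <= int(v) <= 9},
    # The same element is inconsequential to P3M output, so no check is applied.
    ("EMF", "P3M"): {},
}


def validate_for(source: str, model: str, record: dict[str, str]) -> list[str]:
    """Return the names of elements failing the checks selected for this file/model pair."""
    checks = RULES.get((source, model))
    if checks is None:
        raise KeyError(f"no rules defined for {source}/{model}")
    # A missing element is treated as an empty value and therefore fails its check.
    return [name for name, ok in checks.items() if not ok(record.get(name, ""))]
```

The same record can therefore pass for one model and fail for another without any change to `validate_for`.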

Generally, DCSPLANS' models are run on a standard schedule which coincides with required briefings/reports and which also facilitates use of one model's output as input for another model. There are, however, occasions when a model's output is required on very short notice. In these circumstances, the time normally devoted to data validation may not be available, and the DCSPLANS' models would have to be run in the quickest possible time without regard to data integrity. While such a procedure seems unwise, it may occur, and the DCSPLANS validation tool must provide for such a contingency by allowing itself to be circumvented if required. In this regard, the DCSPLANS data filter cannot be a mandatory part of any integral data extraction or modeling process.
The majority of DCSPLANS modeling activities will be done in batch mode. The extraction of pertinent data from large input files is also a batch process (e.g., the "UTRACS" program developed and used by DCSPLANS to extract pertinent data from the Enlisted Master File). However, the capabilities to manipulate data dictionary metadata on-line and to query the metadatabase on-line are crucial to effective, user-friendly operation of the data filter system. All other data filter processes (e.g., EVR formulation) will be done in batch mode to ensure run-time efficiency.

Based upon an examination of the overall DCSPLANS situation, and keying on the points just mentioned, the goal of the DCSPLANS data filter system is to validate all externally provided input data used by DCSPLANS' force alignment models in consonance with established DCSPLANS quality control standards.

Key objectives of the DCSPLANS data filter system are:

1. It must be compatible with the existing DCSPLANS computer system configuration.

2. It must allow flexible and easy additions and updates to the metadatabase.

3. Its interface with the data extraction and modeling processes must be optional (at the discretion of the Chief, DCSPLANS; otherwise it will be an automatic, mandatory interface).

4. It must provide for the automatic adjustment of edit and validation rules to suit the particular source file and model being processed.

5. It must provide an interactive on-line query facility for accessing the metadatabase.

6. It must provide an error/status report generation facility.

7. It must be a user-friendly system.
8. System development and implementation costs must be consistent with the "local" nature of the system. A conservative approach is desired.


B.   "DATA  FILTER"  STRUCTURE 

This section uses a structure diagram (in modified format) to set forth the proposed structure of the DCSPLANS "data filter" system software. The structure is derived from a functional decomposition process in which major system functions are split successively into sets of sub-functions. The proposed DCSPLANS system will be decomposed to three levels. This decomposition demonstrates the hierarchical control structure and relationships of the modules which comprise the overall "data filter" program. It does not represent any particular processing sequence or order of decision-making. [Ref. 13:p. 149]

The structure diagram is normally presented in the graphical format shown in Figure 4.1. However, due to the crowding that a three-level decomposition would produce, the major system functions (level 1) and subordinate modules (levels 2 and 3) are represented here in paragraph/sub-paragraph format (see Figure 5.1). Modules depicted in this manner are easily transferred to a graphic representation of the overall system, if required.
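Because paragraph numbering follows mechanically from the hierarchy, a single tree can drive either the graphic or the paragraph representation. The sketch below, with a hypothetical `Node` type, derives the 1.0 / 1.1 / 1.1.1 numbering used in Figure 5.1 from a nested structure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One function, sub-function, or module in the decomposition (hypothetical type)."""
    name: str
    children: list["Node"] = field(default_factory=list)


def paragraph_format(roots: list[Node]) -> list[str]:
    """Number a decomposition in the 1.0 / 1.1 / 1.1.1 paragraph style."""
    lines: list[str] = []

    def walk(node: Node, path: list[int]) -> None:
        number = ".".join(str(i) for i in path)
        if len(path) == 1:
            # Level 1 is written as 1.0, 2.0, ... and shown in capitals.
            lines.append(f"{number}.0  {node.name.upper()}")
        else:
            lines.append(f"{number}  {node.name}")
        for i, child in enumerate(node.children, start=1):
            walk(child, path + [i])

    for i, root in enumerate(roots, start=1):
        walk(root, [i])
    return lines


tree = [
    Node("Control Data Filter System", [
        Node("Verify Transaction Validity", [
            Node("Read Access and Transaction Codes"),
            Node("Evaluate Codes"),
        ]),
    ]),
]
for line in paragraph_format(tree):
    print(line)
```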

1. Structure Diagram

The proposed data filter system contains five major functions (Control Data Filter System, Maintain Metadatabase, Produce EVR, Validate Input Data, Generate Reports). The system's hierarchical structure is illustrated below, followed by descriptions of each major function, sub-function, and lower level module.
DCSPLANS Data Filter

1.0  FIRST MAJOR FUNCTION (Level 1)

1.1  First Sub-function of 1.0 (Level 2)

1.1.1  First Module of 1.1 (Level 3)

1.1.2  Second Module of 1.1 (Level 3)

1.1.3  Third Module of 1.1 (Level 3)

1.2  Second Sub-function of 1.0 (Level 2)

1.2.1  First Module of 1.2 (Level 3)

2.0  SECOND MAJOR FUNCTION (Level 1)

2.1  First Sub-function of 2.0 (Level 2)

2.1.1  First Module of 2.1 (Level 3)

2.1.2  Second Module of 2.1 (Level 3)

2.2  Second Sub-function of 2.0 (Level 2)

2.3  Third Sub-function of 2.0 (Level 2)

2.3.1  First Module of 2.3 (Level 3)

(ETC.)


Figure 5.1  Sample Paragraph Format


1.0  CONTROL DATA FILTER SYSTEM

1.1  Verify Transaction Validity

1.1.1  Read Access and Transaction Codes

1.1.2  Evaluate Codes

1.1.3  Implement Validity Decision

1.2  Provide Menu/Screen

1.2.1  Read Validity Decision

1.2.2  Display Appropriate Screen

1.3  Transfer Control

1.3.1  Read Screen Input

1.3.2  Determine Proper Process

1.3.3  Pass Program Control

2.0  MAINTAIN METADATABASE

2.1  Control

2.1.1  Provide Metadatabase Menu

2.1.2  Transfer Control

2.2  Add Metadata

2.2.1  Read Add Data

2.2.2  Check Uniqueness

2.2.3  Check Format

2.2.4  Accept Data

2.3  Delete Metadata

2.3.1  Read Delete Request

2.3.2  Locate Metadata

2.3.3  Remove Metadata

2.4  Change Metadata

2.4.1  Read Change Request

2.4.2  Locate Metadata

2.4.3  Update Metadata

3.0  PRODUCE EVR

3.1  Control

3.2  Retrieve Metadata

3.2.1  Read Source File/Model Codes

3.2.2  Open Metadata File(s)

3.2.3  Extract Pertinent Data Values

3.3  Formulate EVR

3.3.1  Load Variables

3.3.2  Set Switches

4.0  VALIDATE INPUT DATA

4.1  Control

4.2  Select EVR

4.2.1  Determine Input Record Types

4.2.2  Extract Applicable EVR

4.3  Apply EVR

4.3.1  Read Input Data

4.3.2  Read EVR

4.3.3  Check Parameters

4.4  Provide Processed Input Data

4.4.1  Read Error Code

4.4.2  Transfer Erroneous Data/Error Code

4.4.3  Transfer Valid Data

4.5  Maintain Statistics

4.5.1  Maintain Transaction Count

4.5.2  Maintain Error Count

4.5.3  Sort Error Types

5.0  GENERATE REPORTS

5.1  Control

5.2  Retrieve Report/Response Data

5.2.1  Determine Report/Response Type

5.2.2  Read Applicable Data

5.3  Perform Calculations

5.4  Provide Report/Response

5.4.1  Determine Format

5.4.2  Format Data

5.4.3  Transfer to Output Device


2. Narrative Descriptions

The following are succinct explanations of the key aspects of each structure diagram function, sub-function, and module. Each lower level description serves to refine/expand the detail of its superior level.

1.0 CONTROL DATA FILTER SYSTEM: This function controls access to the data filter system and verifies transaction validity. It also provides screens for implementing other major system functions, and transfers control to these processes.

1.1 VERIFY TRANSACTION VALIDITY: This sub-function ensures that the user is authorized access to the system for the desired transaction, and that the transaction itself is valid (e.g., an attempt to validate the Enlisted Management File for use in the Officer Promotion Model would be rejected).
1.1.1 READ ACCESS AND TRANSACTION CODES: This module reads in the user's access code and the transaction codes indicating the desired process and the source input file/model(s) involved.

1.1.2 EVALUATE CODES: This module checks user-supplied codes against authorized access and transaction codes.
1.1.3 IMPLEMENT VALIDITY DECISION: This module will either reject the transaction or pass an indication of a valid transaction to module 1.2.1. This module also sets restrictions within authorized processes (e.g., a user may be allowed to add metadata, but not to change or delete existing metadata).
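The checks performed by sub-function 1.1 might look like the following sketch. The access codes, operation names, and permitted source file/model pairs (including `OPM` for an officer model) are hypothetical; in the proposed system such tables would presumably be held with the other metadata.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    valid: bool
    allowed_ops: frozenset  # restrictions passed on with a valid transaction
    reason: str = ""


# Hypothetical authorization tables.
USER_OPS = {
    "U100": frozenset({"add", "query"}),
    "U200": frozenset({"add", "change", "delete", "query", "validate"}),
}
VALID_PAIRS = {("EMF", "PRIM"), ("EMF", "P3M")}


def verify_transaction(access_code: str, op: str, source: str = "", model: str = "") -> Decision:
    """Evaluate the codes, then either reject or pass a validity decision with restrictions."""
    ops = USER_OPS.get(access_code)
    if ops is None:
        return Decision(False, frozenset(), "unknown access code")
    if op not in ops:
        return Decision(False, ops, f"operation '{op}' not authorized")
    if op == "validate" and (source, model) not in VALID_PAIRS:
        return Decision(False, ops, f"{source} is not validated for {model}")
    return Decision(True, ops)
```

Returning the permitted operations with the decision mirrors the idea that module 1.1.3 sets restrictions within authorized processes, rather than giving a simple yes/no.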

1.2 PROVIDE MENU/SCREEN: This sub-function provides the user with the appropriate screen for continued use of the system.

1.2.1 READ VALIDITY DECISION: This module reads the validity indicator produced by module 1.1.3.

1.2.2 DISPLAY APPROPRIATE SCREEN: This module causes either a menu or a screen, as appropriate, to appear on the monitor.

1.3 TRANSFER CONTROL: This sub-function passes control to an appropriate system module in response to user input.

1.3.1 READ SCREEN INPUT: This module reads user responses to terminal prompts.

1.3.2 DETERMINE PROPER PROCESS: This module interprets user input in terms of the desired system function (e.g., update metadata, generate report, etc.).

1.3.3 PASS PROGRAM CONTROL: This module passes control to the appropriate system module.

2.0 MAINTAIN METADATABASE: This function creates new metadatabase entries, deletes metadatabase contents, and makes changes to the existing metadatabase.

2.1 CONTROL: This sub-function displays the metadatabase menu, and governs the activation and sequence of add, change and delete processes.

2.1.1 PROVIDE METADATABASE MENU: This module displays a menu giving the user options of adding, deleting or changing metadata.

2.1.2 TRANSFER CONTROL: This module passes control to sub-function 2.2, 2.3, or 2.4, depending on the user's request and access authorization.

2.2 ADD METADATA: This sub-function reads metadata input, checks it for duplication and proper entry format, and either rejects the input or stores it in the metadatabase.

2.2.1 READ ADD DATA: This module reads data which the user desires to enter into the metadatabase.

2.2.2 CHECK UNIQUENESS: This module checks the metadatabase to ensure that the data to be added does not already reside there.

2.2.3 CHECK FORMAT: This module checks data to be added for compliance with prescribed standard metadata entry formats.

2.2.4 ACCEPT DATA: This module evaluates the results of module 2.2.2 and 2.2.3 processing, and either rejects the data to be added or stores it in the metadatabase.
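Modules 2.2.2 through 2.2.4 can be sketched as a single function that checks uniqueness, then format, and only then stores the entry. The entry-format standard shown (an upper-case name of up to eight characters, a known type, a positive length) is an assumption made for illustration, not the prescribed DCSPLANS format.

```python
import re

# Hypothetical entry-format standard, assumed for illustration only.
NAME_FORMAT = re.compile(r"[A-Z][A-Z0-9_]{0,7}")
TYPES = {"alpha", "numeric", "alphanumeric", "date"}


def add_metadata(metadatabase: dict[str, dict], entry: dict) -> tuple[bool, str]:
    """Check uniqueness, then format, then accept or reject (after modules 2.2.2-2.2.4)."""
    name = entry.get("name", "")
    if name in metadatabase:                 # CHECK UNIQUENESS
        return False, "duplicate entry"
    if NAME_FORMAT.fullmatch(name) is None:  # CHECK FORMAT
        return False, "bad name format"
    if entry.get("type") not in TYPES:
        return False, "unknown type"
    length = entry.get("length")
    if not isinstance(length, int) or length <= 0:
        return False, "bad length"
    metadatabase[name] = dict(entry)         # ACCEPT DATA
    return True, "stored"
```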

2.3 DELETE METADATA: This sub-function reads a metadata deletion request, locates the data in the metadatabase, and removes it.

2.3.1 READ DELETE REQUEST: This module reads the user's request to delete data.

2.3.2 LOCATE METADATA: This module locates the indicated metadata in the metadatabase.

2.3.3 REMOVE METADATA: This module removes metadata from the metadatabase after a re-verification of the user's desire to delete the data.

2.4 CHANGE METADATA: This sub-function reads a metadata change request, locates the data to be changed, and updates the data after verification that the new metadata meets the prescribed entry format.

2.4.1 READ CHANGE REQUEST: This module reads the user's request to update existing metadata.

2.4.2 LOCATE METADATA: This module locates the metadata to be changed.

2.4.3 UPDATE METADATA: This module replaces old metadata with new metadata.

3.0    PRODUCE   EVR:      This    function    produces   edit    ar.d 
validation    rules   for    use    by   sub-function   4.3.       Metadata 
values    are    extracted    from    the    metadatabase    and    are 
transformed    into  bounded    conditio nal    statements    through 
which    input   data   will    be    run. 

3.  1  CONTROL:  This  sub-function  governs  the  activation 
and  sequence  of  rroccssea  involved  with  the  production 
of    edit   and    validation    rules. 

3.2 ACCEPT PROCESSING CODES: This sub-function reads the source file and model codes entered by the user, opens the appropriate metadata files, extracts applicable metadata values, and stores them in a "variables" file.

3.2.1 READ SOURCE FILE/MODEL CODES: This module reads the source file and model identification codes entered earlier by the user.

3.2.2 OPEN METADATA FILE(S): This module identifies and opens all metadata files containing data relating to the source file and models noted by module 3.2.1.

3.2.3 EXTRACT PERTINENT DATA VALUES: This module extracts pertinent metadata values from the opened metadatabase files and stores the data in a "variables" file.
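Sub-function 3.2 amounts to selecting the metadata that apply to one source-file/model combination and saving them. The hypothetical Python sketch below models the metadatabase as a dictionary keyed by (source file, model, data element). The model codes MODEL_A and MODEL_B and the JSON layout of the "variables" file are assumptions made for illustration.

```python
import json

# Hypothetical metadatabase rows: (source file, model, data element) -> metadata values.
METADATABASE = {
    ("EMF", "MODEL_A", "GRADE"): {"required": True, "low": 1, "high": 9},
    ("EMF", "MODEL_A", "AGE"): {"required": True, "low": 17, "high": 60},
    ("EMF", "MODEL_B", "GRADE"): {"required": False, "low": 1, "high": 9},
}


def extract_variables(source_file, model):
    """Keep only the metadata that applies to this source-file/model combination."""
    return {
        element: values
        for (src, mdl, element), values in METADATABASE.items()
        if src == source_file and mdl == model
    }


def write_variables_file(variables, path):
    """Store the extracted values as the "variables" file (JSON is an assumption)."""
    with open(path, "w") as f:
        json.dump(variables, f, indent=2)
```

Both models draw on the same EMF source file but receive different variable sets. This is the behavior the prototype is meant to test.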

3.3 FORMULATE EVR: This sub-function reads the metadata values stored in the "variables" file into a file of pre-established conditional statements, thereby setting switches either on or off and setting upper and lower boundaries of acceptable input data values. (Settings and boundaries will therefore vary according to the combination of source file and model codes presented by the user.)

3.3.1 LOAD VARIABLES: This module reads the "variables" file into a file of pre-set conditional statements.

3.3.2 SET SWITCHES: This module, depending on variable values, sets switches either on or off and establishes upper and lower boundaries, as required.
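One way to picture the "pre-set conditional statements" of sub-function 3.3 is a single rule template shared by every data element, with the "variables" file supplying each element's switch and bounds. The Python sketch below gives each element one on/off switch (whether a value is required) plus a lower and upper bound. These field names are hypothetical.

```python
# Hypothetical "variables" file contents for one source-file/model combination.
variables = {
    "GRADE": {"required": True, "low": 1, "high": 9},
    "AGE": {"required": False, "low": 17, "high": 60},
}


def formulate_evr(variables):
    """Fill a pre-set conditional template with each element's switch and bounds."""
    evr = {}
    for element, v in variables.items():
        # Default arguments freeze this element's switch and bounds in its rule.
        def rule(value, required=v["required"], low=v["low"], high=v["high"]):
            if value is None:
                return not required      # a blank passes only when the switch is off
            return low <= value <= high  # bounded conditional statement
        evr[element] = rule
    return evr
```

The template stays fixed and only the variables change. A different source-file/model combination therefore yields different EVR without any change to the rule code.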

4.0 VALIDATE INPUT DATA: This function actually performs the validation by selecting specific EVR, applying these EVR to the input data, and providing processed input data to either a "validated data" file or an "error" file. This function also maintains statistics on the number of data items processed and the number and category of errors found.

4.1 CONTROL: This sub-function governs the activation and sequence of processes involved in the actual validation of input data.

4.2 SELECT EVR: This sub-function identifies the type of record(s) being validated from the source file and activates only those EVR which apply. (This sub-function precludes the validation program from unnecessarily running an input record past all source file EVR, thereby enhancing the run-time efficiency of the overall process.)

4.2.1 DETERMINE INPUT RECORD TYPE: This module identifies the subset of records that are being validated from the source input file.

4.2.2 EXTRACT APPLICABLE EVR: This module extracts only those EVR applicable to the record types being validated.
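A small lookup can illustrate the selection step in sub-function 4.2. The Python sketch below assumes that each input record carries a record-type field and that EVR are catalogued by record type. The record types and rule names are hypothetical placeholders, not actual EMF record types.

```python
# Hypothetical catalogue: record type in the source file -> names of the EVR that apply.
EVR_BY_RECORD_TYPE = {
    "ACCESSION": {"check_grade", "check_age"},
    "REENLISTMENT": {"check_grade", "check_term"},
    "SEPARATION": {"check_separation_code"},
}


def select_evr(records, type_field="rec_type"):
    """Find which record types are present, then activate only their EVR."""
    present = {record[type_field] for record in records}   # DETERMINE INPUT RECORD TYPE
    active = set()
    for record_type in present:                            # EXTRACT APPLICABLE EVR
        active |= EVR_BY_RECORD_TYPE.get(record_type, set())
    return active
```

A batch containing only separation records activates a single rule and skips the rest of the catalogue. This is the run-time saving the sub-function is meant to provide.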

4.3 APPLY EVR: This sub-function reads the input data and its associated EVR, and compares them to verify compliance.

4.3.1 READ INPUT DATA: This module sequentially reads the input data to be validated.

4.3.2 READ EVR: This module reads EVR from module 4.2.2.

4.3.3 CHECK PARAMETERS: This module compares input data to EVR parameters, assigning an appropriate error code (including "no error").
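Module 4.3.3 assigns exactly one error code per record and treats "no error" as a code in its own right. The Python sketch below uses hypothetical bounds-style EVR parameters and a hypothetical three-value code table. It reports the first violation found, which is one possible policy and not necessarily the one the design prescribes.

```python
# Hypothetical error codes; 0 is the "no error" code.
NO_ERROR, MISSING_VALUE, OUT_OF_RANGE = 0, 1, 2

# Hypothetical EVR parameters: data element -> (lower bound, upper bound).
EVR_PARAMETERS = {"GRADE": (1, 9), "AGE": (17, 60)}


def check_parameters(record, parameters=EVR_PARAMETERS):
    """Compare a record with its EVR parameters; return the first error code found."""
    for element, (low, high) in parameters.items():
        value = record.get(element)
        if value is None:
            return MISSING_VALUE
        if not low <= value <= high:
            return OUT_OF_RANGE
    return NO_ERROR
```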

4.4 PROVIDE PROCESSED INPUT DATA: This sub-function reads the processed data and its error code, and transfers the data accordingly.

4.4.1 READ ERROR CODE: This module reads the data and associated error code from module 4.3.3.

4.4.2 TRANSFER ERRONEOUS DATA/ERROR CODE: This module transfers erroneous data with its associated error code to an "error" file.

4.4.3 TRANSFER VALID DATA: This module transfers all valid input data to a "validated data" file.
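Sub-function 4.4 is essentially a two-way split on the error code. The Python sketch below models the "validated data" and "error" files as lists and treats the value 0 as "no error". Both are illustrative assumptions.

```python
NO_ERROR = 0  # hypothetical "no error" code


def transfer_processed_data(processed, validated_file, error_file):
    """Route each (record, error code) pair to the "validated data" or "error" file.

    Both files are modelled as lists; a real system would append to disk files.
    """
    for record, code in processed:               # READ ERROR CODE
        if code == NO_ERROR:
            validated_file.append(record)        # TRANSFER VALID DATA
        else:
            error_file.append((record, code))    # TRANSFER ERRONEOUS DATA/ERROR CODE
```

Erroneous records keep their error code alongside them. Anyone reviewing the "error" file can therefore see why each record was rejected.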

4.5 MAINTAIN STATISTICS: This sub-function maintains a running count of the number of transactions processed and the number and type of errors found.

4.5.1 MAINTAIN TRANSACTION COUNT: This module maintains a running count of the number of transactions processed in a validation activity.

4.5.2 MAINTAIN ERROR COUNT: This module counts the number of errors found and notes the error code involved.

4.5.3 SORT ERROR TYPES: This module sorts a validation activity's error count by type of error.
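The three statistics modules of sub-function 4.5 reduce to counting and sorting. The Python sketch below assumes the error codes for one validation activity are available as a list, with 0 as a hypothetical "no error" code. It sorts the error counts by error code.

```python
from collections import Counter


def validation_statistics(error_codes, no_error=0):
    """Summarize one validation activity from its per-transaction error codes."""
    transactions = len(error_codes)                                     # transaction count
    errors = Counter(code for code in error_codes if code != no_error)  # error count by code
    by_type = sorted(errors.items())                                    # sorted by error type
    return transactions, sum(errors.values()), by_type
```

For example, the codes [0, 2, 1, 2, 0] give five transactions, three errors, and the sorted breakdown [(1, 1), (2, 2)].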

5.0 GENERATE REPORTS: This function accepts requests for both printed reports and interactive (terminal) responses, determines and retrieves the appropriate report/response data, performs calculations and formatting as required, and issues the requested report/response.

5.1 CONTROL: This sub-function governs the activation and sequence of processes involved in the production of printed reports and interactive responses to terminal queries.

5.2 RETRIEVE REPORT/RESPONSE DATA: This sub-function determines the type of report/response desired and reads the required data from the appropriate files.

5.2.1 DETERMINE REPORT/RESPONSE TYPE: This module interprets the user's request for information in terms of report/response content.

5.2.2 READ APPLICABLE DATA: This module locates, reads, and temporarily stores the data needed for the requested report/response.

5.3 PERFORM CALCULATIONS: This sub-function determines whether calculations are required to produce the desired information and, if so, reads the appropriate data and performs the required operations, producing "new" report/response data.

5.4 PROVIDE REPORT: This sub-function determines the appropriate report/response format, formats the data accordingly, and transfers the formatted data to the appropriate output device.

5.4.1 DETERMINE REPORT FORMAT: This module determines the format required for the desired response in accordance with pre-established format parameters.

5.4.2 FORMAT DATA: This module arranges data in the proper format.

5.4.3 TRANSFER TO OUTPUT DEVICE: This module transfers the formatted data to the appropriate output device.
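The compact Python sketch below covers sub-functions 5.3 and 5.4. A calculation derives "new" report data from stored counts, and a formatter applies pre-established format parameters for each output device. The device names, widths, and separators are hypothetical. Sending the returned text to a printer or terminal is left to the caller.

```python
# Hypothetical pre-established format parameters, one set per output device.
FORMAT_PARAMETERS = {
    "printer": {"width": 60, "separator": " ...... "},
    "terminal": {"width": 30, "separator": ": "},
}


def perform_calculations(counts):
    """Derive "new" report data (an error rate) from stored counts."""
    total = counts["transactions"]
    rate = 100.0 * counts["errors"] / total if total else 0.0
    return {**counts, "error_rate_pct": round(rate, 1)}


def provide_report(title, data, device):
    """Pick the device's format, lay out the data, and return text ready for output."""
    fmt = FORMAT_PARAMETERS[device]                          # DETERMINE REPORT FORMAT
    lines = [title.center(fmt["width"]).rstrip()]            # FORMAT DATA
    lines += [f"{name}{fmt['separator']}{value}" for name, value in data.items()]
    return "\n".join(lines)                                  # caller transfers to device
```

The format parameters are kept apart from the formatting code. In the same spirit as the data dictionary approach, a new output device needs only a new entry, not new code.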

C. "DATA FILTER" IMPLEMENTATION

Two key advantages inherent in the proposed local data validation system concept are low development costs and speedy implementation. In this light, initial DCSPLANS development efforts must focus on the creation of a prototype system that takes maximum advantage of existing resources. Specifically, the DCSPLANS prototype must incorporate the existing CJTSACS program which extracts relevant Enlisted Master File (EMF) data, the existing DBASE II data dictionary which currently includes general model and office metadata in its metadatabase, and the existing DCSPLANS IBM PC microcomputer. The DCSPLANS local data filter system will therefore consist of an IBM PC based DBASE II program which filters EMF input data for use in two application models. (Two models must be used to test the system's ability to differentiate between the degrees of validation required by separate models using the same input data source file.)

The following steps suggest a methodology for development of the initial DCSPLANS prototype "data filter" system.

1. Determine and implement the proper interface mechanism for feeding UTBACS-extracted EMF data through the IBM PC data filter system.

2. Expand current data dictionary capabilities by creating additional metadatabase modules which will accept and store metadata about source file and model data elements. Create an additional data dictionary metadatabase module that will accept and store EVR.

3. Using the Phase II checklist from chapter four, comprehensively construct data definitions for EMF and model data elements, and create the EVR metadata which sets data element validation parameters and interrelationships. This step must be accomplished with the full, constant cooperation of those DCSPLANS personnel most closely acquainted with the EMF and the two application models being used for the prototype.

4. Load the data definition and EVR metadata into the data dictionary metadatabase.

5. Using the functional modules from section B of this chapter as a guide (particularly function 4.1), create an edit/validation program which will control and implement the overall data filter process.

The development methodology presented above is based upon a limited on-site review of DCSPLANS operations. A more comprehensive examination of the DCSPLANS environment (see Phase I of the planning and initial design process described in chapter four) will most likely uncover some additional requirements and necessary adjustments. Therefore, a detailed on-site environmental review is an essential prerequisite to any DCSPLANS data filter development/implementation effort, especially one being undertaken by non-DCSPLANS personnel.

VI. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS

DCSPLANS, MILPERCEN suffers from a data control problem common to many small user groups in large data processing systems: it is unable to verify the correctness of input data obtained from sources outside its span of control. At present, DCSPLANS must rely almost exclusively on the competence of its outside sources to guarantee the integrity of its input data. This situation is causing DCSPLANS' managers a great deal of concern.

Top-level Army decision-makers use output from DCSPLANS' applications to formulate long-range personnel management policies. Thus, the adverse impact of erroneous input data entering DCSPLANS' models can be far-reaching and extremely serious. Despite this fact, DCSPLANS' small size relative to the overall MILPERCEN information processing system precludes it from strongly influencing the adoption of a system-wide validation capability. DCSPLANS must therefore develop and implement a "local" solution to its data validation problem.

DCSPLANS' models and their associated input source files contain many of the same data items. Additionally, a variety of relationships exists among the input data. This situation demands that DCSPLANS use a variety of validation techniques to ensure the accuracy of data used by its models. In addition to routine format checks, a series of reasonableness checks is needed to guarantee that input is both complete and consistent. Reasonableness checks are more complex than format checks and are, in fact, the real key to ensuring a truly integrated validation process (i.e., data elements, records, and files are valid not only by themselves, but also in relation to other relevant elements, records, and files). Of course, validation of the legality and proper sequencing of an input activity itself must precede the validity checks on the data.
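A small Python sketch makes the distinction between format checks and reasonableness checks concrete. It uses a hypothetical record with an entry date and a date of rank. Each date is first checked on its own. Only then is the cross-element rule applied: a date of rank cannot precede the date of entry. The field names and the rule are illustrative assumptions, not DCSPLANS criteria.

```python
from datetime import date


def format_check(record):
    """Element level: each date field is present and is a date on its own."""
    return all(isinstance(record.get(f), date) for f in ("entry_date", "rank_date"))


def reasonableness_check(record):
    """Relational level: a date of rank cannot precede the date of entry."""
    return record["rank_date"] >= record["entry_date"]


def validate(record):
    """Run the format check first; the cross-element check assumes it passed."""
    return format_check(record) and reasonableness_check(record)
```

A record with two well-formed dates in the wrong order passes the format check but fails the reasonableness check. Errors of this kind can only be caught by relating elements to one another.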

An ideal validation tool for DCSPLANS is the active data dictionary. Configured as a data filter, the dictionary provides a flexible, user-friendly, easily expandable validation system for a "small" user group. The data filter can be developed locally using the expertise currently available within DCSPLANS. Such local development allows the data filter system to be tailored precisely to DCSPLANS' own validation needs. The data dictionary approach permits quick, easy adaptation of the data filter to changes in models and input data source files by simply adjusting dictionary metadata; no extensive validation program rewrites will be required. Also, the use of a metadatabase as a single source of data for building EVR provides a ready-made mechanism for keeping the EVR consistent. Lastly, an active data dictionary allows DCSPLANS to develop future data processing tools/capabilities with relative ease and minimal investments of time and money.

Preliminary planning is crucial to DCSPLANS' successful development of the data filter. The overall DCSPLANS data processing environment must be understood, and data definition and associated validation requirements must be comprehensively examined and carefully documented. Thorough accomplishment of these first two phases of development will provide a solid base for both preliminary and detailed system design. Preliminary design should be accomplished through a functional decomposition of major system functions. These major functions must be derived from an analysis of phase one and two results, and must support achievement of the specific goals and key objectives of the DCSPLANS system.


B. RECOMMENDATIONS

An effective DCSPLANS approach to its data validation problem must key on the concepts/designs presented in this thesis. It is recommended that:

1. DCSPLANS pursue an efficient "local" solution which can be tailored to its specific needs, rather than await or attempt to influence the adoption of an organization-wide validation system.

2. the local solution applied by DCSPLANS be an active data dictionary "data filter."

3. DCSPLANS begin development with a prototype system that will validate Enlisted Master File (EMF) data for use in two models. This approach tests the system's ability to differentiate between the degrees of validation required by different models using the same source data file, and also takes advantage of the existing GTPACS program (which extracts relevant EMF data). The prototype should use an easy-to-program, easy-to-use relational database management system with a simple query language facility (similar to DBASE II).

4. DCSPLANS appoint a small project team to oversee the data filter development. The team must conduct a thorough on-site review of DCSPLANS environmental characteristics and data definition/validation criteria (Chapter Four) prior to revisions of the general design (Chapter Five) and subsequent coding. While detailed design and coding can be conducted off-site (perhaps as a thesis project), the review of environmental characteristics, data definition, and validation criteria must be accomplished at DCSPLANS by personnel familiar with DCSPLANS operations. The checklists in chapter four provide comprehensive guidelines for such an examination.